Abstract

Shannon's secrecy system is studied in a setting where both the legitimate decoder and the
wiretapper have access to side information sequences correlated to the source, but the wiretapper
receives both the coded information and the side information via channels that are noisier
than the respective channels of the legitimate decoder, which, in turn, also shares a secret
key with the encoder. A single-letter characterization is provided for the achievable region in
the space of five figures of merit: the equivocation at the wiretapper, the key rate, the distortion 
of the source reconstruction at the legitimate receiver, the bandwidth expansion factor of the 
coded channels, and the average transmission cost (generalized power). Beyond the fact that 
this is an extension of earlier studies, it also provides a framework for studying fundamental 
performance limits of systematic codes in the presence of a wiretap channel. The best achievable 
performance of systematic codes is then compared to that of a general code in several respects, 
and a few examples are given. 

Index Terms: wiretap channel, encryption, Shannon's cipher system, separation theorem, 
systematic codes. 
1 Introduction



Wyner, in his well-known paper on the wiretap channel [11], studied the problem of secure
communication across a degraded broadcast channel, without using a secret key, where the legitimate
receiver has access to the output of the good channel and the wiretapper receives the output of
the bad channel. In that paper, Wyner characterized the optimum trade-off between reliable
coding rates and the equivocation at the wiretapper, which was defined in terms of the conditional
entropy of the source given the output of the bad channel, observed by the wiretapper. Among
other things, Wyner established and characterized, in the same paper, the notion of the secrecy
capacity, which is the maximum coding rate that still allows full secrecy, where the equivocation
is equal to the (unconditional) entropy of the source, thus rendering the information available to
the wiretapper virtually useless for learning anything about the source. By applying good codes
at rates close to the secrecy capacity, the channel is fully exploited in the sense that the "excess
noise" that is suffered at the bad channel output (beyond the noise at the good channel output)
plays the role of securing the message with maximum efficiency. The idea behind the construction
of a good code for the wiretapped channel is essentially similar to the idea of binning. One creates
a relatively large code, which is reliably decodable at the legitimate receiver, and which is thought
of as a hierarchy of randomized sub-codes, each of which is reliably decodable individually
by the wiretapper. However, the bits that are decodable by the wiretapper are only those of the
randomization, and thus carry information that is irrelevant with regard to the source.

Throughout the three decades that have passed since [11] was published, the results of that
paper have been extended in many directions, and we mention here only a few. Csiszár and
Körner [3] have generalized Wyner's setting to a broadcast channel that is not necessarily degraded
(allowing also a common message to both receivers). Very shortly afterwards, Leung-Yan-Cheong
and Hellman [4] studied the Gaussian wiretap channel, and have shown, among other things, that
its secrecy capacity is simply the difference between the capacities of the main (legitimate) channel
and the wiretap channel. In [8], Ozarow and Wyner studied another model, referred to as the type
II wiretap channel, where the main communication channel is noiseless, but the wiretapper has
access to a subset of the coded bits, and optimal tradeoffs were characterized. In [13], the wiretap
channel model was extended to have two parallel broadcast channels, connecting one encoder and
one legitimate decoder, where both channels are wiretapped by non-collaborating wiretappers, and
again, optimum tradeoffs were given in terms of single-letter expressions. In [13], the scope of
the wiretap channel model was extended in two ways: first, by allowing a secret key to be shared
between the encoder and the legitimate receiver, and secondly, by allowing a certain distortion in
the reconstruction of the source at the legitimate receiver. The main coding theorem of [13]
suggests a separation principle, which asserts that no asymptotic optimality is lost if the encoder
first applies a rate-distortion source code, then encrypts the compressed bits, and finally applies
a good code for the wiretap channel. More recently, the Gaussian wiretap channel model of [3] was
further extended in two directions: one is the Gaussian multiple access wiretap channel of [10], and
the other is the Gaussian interference wiretap channel of [6], [7], where the encoder has access to
the interference signal as side information, similarly as in Costa's dirty paper channel [1].

In this paper, we extend the setting of the wiretap channel in a different direction. For simplicity, 
we adopt the structure of a degraded broadcast channel, as in [11] (though it is plausible that the 
results are generalizable to more general broadcast channels), and similarly as in [13], we allow a 
secret key shared between the encoder and the authorized decoder, as well as lossy reconstruction 
of the source within a prescribed distortion level, but we, moreover, allow also side informations, 
correlated to the source, to be available both to the legitimate decoder and the wiretapper. We 
assume that the wiretapper receives its side information via a channel that is degraded relative 
to the side information channel of the legitimate decoder (see Fig. 1). Our main result is a
single-letter characterization of the optimum tradeoff among five figures of merit: the equivocation at
the wiretapper, the distortion level in reconstructing the source at the authorized decoder, the
bandwidth expansion factor of the coded channels, the rate of the secret key relative to the source,
and the average transmission cost.

One of the motivations for this study is that it establishes a framework for deriving performance
limits of systematic codes for wiretapped channels and assessing their loss in performance compared
to general codes (as was done in [9] in a different context): The side information channels ($P_{V|U}$
and $P_{W|V}$ in Fig. 1) can be thought of as conveying the systematic (uncoded) part of the codeword.
We compare the best achievable performance of systematic codes to that of general codes at the
same coding rates, in several aspects, like the maximum achievable equivocation in the absence
of a secret key, the maximum achievable equivocation in the presence of a full-rate key, the key
rate needed to achieve the maximum achievable equivocation, and the distortion achieved when
the channel is utilized at a rate close to the secrecy capacity. A few examples are given for situations
where systematic codes are as good as (and sometimes even better than) general codes.

Figure 1: The wiretap channel with side information at the receivers.

The outline of the remaining parts of this paper is as follows: In Section 2, we set up the 
notation, formulate the problem, present the main result, and make a few comments. In Section 3, 
we discuss the implications on systematic coding, and we make comparisons with general codes, as 
described in the previous paragraph. In Section 4, we prove the converse part of the main result, 
and finally, in Section 5, we prove the direct part. 



2 Problem Formulation and Main Result 



We begin by establishing some notation conventions. Throughout this paper, scalar random
variables (RV's) will be denoted by capital letters, their sample values will be denoted by the respective
lower case letters, and their alphabets will be denoted by the respective calligraphic letters. A
similar convention will apply to random vectors and their sample values, which will be denoted with
the same symbols superscripted by the dimension, or by the bold face font, if there is no room for
confusion regarding the dimension. Thus, for example, $U^N$ ($N$ a positive integer) or $\mathbf{U}$ will denote
a random $N$-vector $(U_1, \ldots, U_N)$, and $u^N = (u_1, \ldots, u_N)$ is a specific vector value in $\mathcal{U}^N$, the $N$-th
Cartesian power of $\mathcal{U}$.

Sources and channels will be denoted generically by the letter $P$, subscripted by the name of
the RV and its conditioning, if applicable, e.g., $P_U(u)$ is the probability function of $U$ at the point
$U = u$, $P_{Y|X}(y|x)$ is the conditional probability of $Y = y$ given $X = x$, and so on. Whenever clear
from the context, these subscripts will be omitted. Information theoretic quantities like entropies
and mutual informations will be denoted following the usual conventions of the Information Theory
literature, e.g., $H(U^N)$, $I(X^n; Y^n)$, and so on. For single-letter information quantities (i.e., when
$n = 1$ or $N = 1$), subscripts will be omitted, e.g., $H(U^1) = H(U_1)$ will be denoted by $H(U)$,
similarly, $I(X^1; Y^1) = I(X_1; Y_1)$ will be denoted by $I(X; Y)$, and so on. For three random variables,
generically denoted $A$, $B$, and $C$, the notation $A \ominus B \ominus C$ will designate the fact that they form,
in this order, a Markov chain. The extension of this notation to longer Markov chains will be
straightforward. The cardinality of a finite set $\mathcal{A}$ will be denoted by $|\mathcal{A}|$. The notation $[a]_+$ will
stand for $\max\{0, a\}$. Finally, for $a, b \in \{0, 1\}$, $a \oplus b$ will denote the modulo 2 sum (XOR) of $a$ and $b$,
and for two general positive integers, $a$ and $b$, the notation $a \oplus b$ will designate the positive integer
whose binary representation is given by the bit-wise modulo 2 sum of the corresponding bits of the
binary representations of $a$ and $b$.
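As a small illustration of the integer form of the modulo 2 sum defined above, the following Python sketch applies it to an index and a key of the same bit length. The function name `xor_combine` and the sample values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def xor_combine(a: int, b: int) -> int:
    """Bit-wise modulo 2 sum of the binary representations of a and b."""
    return a ^ b


# Hypothetical 4-bit message index and 4-bit key.
index = 0b1011
key = 0b0110

cipher = xor_combine(index, key)      # bit-wise XOR of index and key
recovered = xor_combine(cipher, key)  # applying the same key again undoes it
```

Applying the same key twice returns the original integer, which is the property that makes this operation usable for encrypting compressed bits with a shared key.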

We now turn to the formal description of the model and the problem setting. A source $P_U$
generates a sequence of $N$ ($N$ a positive integer) independent copies, $U^N = (U_1, \ldots, U_N)$, of a
finite-alphabet RV, $U \in \mathcal{U}$. At the same time, a discrete memoryless channel (DMC), symbolized by $P_{V|U}$,
generates from $U^N$ another $N$-vector $V^N = (V_1, \ldots, V_N)$, with components in a finite alphabet $\mathcal{V}$,
and another DMC, denoted $P_{W|V}$, produces from $V^N$ yet another $N$-vector $W^N = (W_1, \ldots, W_N)$,
with components in a finite alphabet $\mathcal{W}$. Thus, the joint probability distribution of $(u^N, v^N, w^N)$
is given by

$$P_{U^N}(u^N) P_{V^N|U^N}(v^N|u^N) P_{W^N|V^N}(w^N|v^N) = \prod_{i=1}^{N} \left[ P_U(u_i) P_{V|U}(v_i|u_i) P_{W|V}(w_i|v_i) \right].$$

At the same time and independently, another source $P_K$, henceforth referred to as the key source,
generates a random variable (or vector) $K$ taking values in a finite alphabet $\mathcal{K}$.
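To make the degraded side-information structure concrete, here is a minimal Python sketch for one special case that is an assumption of this example, not part of the general model: $U$ is a binary symmetric source, $P_{V|U}$ is a BSC with crossover `p_uv`, and $P_{W|V}$ is a BSC with crossover `p_vw`. In that case $H(U|V) = h(p_{uv})$ and $H(U|W) = h(p_{uv} * p_{vw})$, where $*$ denotes binary convolution. All function names are hypothetical.

```python
import math


def h2(p: float) -> float:
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def conv(a: float, b: float) -> float:
    """Crossover probability of two cascaded BSCs."""
    return a * (1 - b) + (1 - a) * b


def side_info_entropies(p_uv: float, p_vw: float):
    """Return (H(U|V), H(U|W)) for a BSS U observed through cascaded BSCs."""
    h_u_given_v = h2(p_uv)
    h_u_given_w = h2(conv(p_uv, p_vw))
    return h_u_given_v, h_u_given_w


h_uv, h_uw = side_info_entropies(0.1, 0.1)
# The wiretapper's side information is noisier, so H(U|W) >= H(U|V).
```

The gap $H(U|W) - H(U|V)$ computed this way is the quantity that the discussion following Theorem 1 interprets as the contribution of the better side information at the legitimate decoder.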

Two additional cascaded DMC's operate at a bandwidth expansion factor of $n/N$ channel uses
per source symbol. This means that during the time that the source generates a block $U^N$ of $N$
symbols, the first channel receives a block $X^n$ of $n$ channel input symbols taking on values in a
finite alphabet $\mathcal{X}$, and outputs a block $Y^n$ of $n$ channel output symbols in a finite alphabet $\mathcal{Y}$,
according to

$$P_{Y^n|X^n}(y^n|x^n) = \prod_{j=1}^{n} P_{Y|X}(y_j|x_j),$$

whereas the second DMC receives $Y^n$ as an input vector and outputs a block $Z^n$ of $n$ channel
output symbols in a finite alphabet $\mathcal{Z}$, according to

$$P_{Z^n|Y^n}(z^n|y^n) = \prod_{j=1}^{n} P_{Z|Y}(z_j|y_j).$$

Given $N$ and $n$, a block encoder is a mapping $f_{n,N} : \mathcal{U}^N \times \mathcal{K} \to \mathcal{X}^n$, whose output is
$X^n = (X_1, \ldots, X_n) = f_{n,N}(U^N, K) \in \mathcal{X}^n$. The channel input vector should satisfy an average
transmission cost (generalized power) constraint:

$$\frac{1}{n} \sum_{i=1}^{n} E\{\phi(X_i)\} \le Q, \tag{1}$$

where $\phi : \mathcal{X} \to \mathbb{R}^+$ is the generalized power function and $Q$ is a given positive real. The
corresponding block decoder (of the authorized party) is a mapping $g_{n,N} : \mathcal{Y}^n \times \mathcal{V}^N \times \mathcal{K} \to \hat{\mathcal{U}}^N$, whose
output is $\hat{U}^N = (\hat{U}_1, \ldots, \hat{U}_N) = g_{n,N}(Y^n, V^N, K) \in \hat{\mathcal{U}}^N$, where $\hat{\mathcal{U}}$ is the reproduction alphabet of
the decoder output symbols.

Let $d : \mathcal{U} \times \hat{\mathcal{U}} \to \mathbb{R}^+$ denote a single-letter distortion measure between source symbols and
reproduction symbols, and let the distortion between the vectors, $u^N \in \mathcal{U}^N$ and $\hat{u}^N \in \hat{\mathcal{U}}^N$, be
defined additively across the corresponding components, as usual. Let $R_{U|V}(D)$ denote the
Wyner-Ziv rate-distortion function [12] of the source $U$ with respect to the distortion measure $d$, and a
decoder side information $V$, i.e.,

$$R_{U|V}(D) = \inf \left[ I(U; A) - I(V; A) \right],$$

where the infimum is over all RV's $A$ with alphabet size $|\mathcal{U}| + 1$, that form a Markov chain $A \ominus U \ominus V$
and that satisfy $\min_{\psi : \mathcal{A} \times \mathcal{V} \to \hat{\mathcal{U}}} E\{d(U, \psi(A, V))\} \le D$. Given the degraded broadcast channel
$P_{YZ|X}(y, z|x) = P_{Y|X}(y|x) P_{Z|Y}(z|y)$, we will also define the function

$$\Gamma(r, q) = \sup_{\{P_X :\ I(X;Y) \ge r,\ E\phi(X) \le q\}} I(X; Y|Z) = \sup_{\{P_X :\ I(X;Y) \ge r,\ E\phi(X) \le q\}} \left[ I(X; Y) - I(X; Z) \right], \tag{2}$$

which is similar to Wyner's $\Gamma$ function [11], but with the additional generalized power constraint.
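The definition in eq. (2) can be evaluated numerically in simple cases. The sketch below does so by grid search for one assumed setting: a binary input, a noiseless main channel ($Y = X$), a BSC wiretap channel with crossover `p0`, and no cost constraint ($q = \infty$). The function names and the grid resolution are hypothetical choices for illustration, not part of the paper.

```python
import math


def h2(p: float) -> float:
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def gamma_binary(r: float, p0: float, grid: int = 10001):
    """Grid-search estimate of Gamma(r, infinity) for Y = X and Z = BSC_p0(Y).

    Returns None if no input distribution on the grid satisfies I(X;Y) >= r.
    """
    best = None
    for k in range(grid):
        a = k / (grid - 1)                   # a = Pr{X = 1}
        i_xy = h2(a)                         # noiseless main channel: I(X;Y) = H(X)
        if i_xy < r:
            continue                         # violates the rate constraint
        i_xz = h2(a * (1 - p0) + (1 - a) * p0) - h2(p0)
        value = i_xy - i_xz                  # I(X;Y) - I(X;Z)
        best = value if best is None else max(best, value)
    return best
```

In this setting the maximum is attained by the uniform input for every feasible $r$, so the estimate stays at $h(p_0)$ regardless of $r$, in line with the constant-$\Gamma$ examples discussed in Section 3.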

An $(N, n, \lambda, D, \Delta, R, Q)$ codec is an encoder-decoder pair with parameters $N$ and $n$ that
satisfies the following requirements:

1. The bandwidth expansion factor is $n/N \le \lambda$.

2. The expected distortion between the source and the reproduction satisfies

$$\sum_{i=1}^{N} E\{d(U_i, \hat{U}_i)\} \le ND. \tag{3}$$

3. The equivocation of the message source satisfies

$$H(U^N | W^N, Z^n) \ge N\Delta. \tag{4}$$

4. The rate of the secret key is $H(K)/N \le R$.

5. The generalized transmission power satisfies $\sum_{j=1}^{n} E\{\phi(X_j)\} \le nQ$.

A quintuple $(\lambda, D, \Delta, R, Q)$ is said to be achievable if for every $\epsilon > 0$, there are sufficiently large
$N$ and $n$ for which $(N, n, \lambda + \epsilon, D + \epsilon, \Delta - \epsilon, R + \epsilon, Q + \epsilon)$ codecs exist. The achievable region of
quintuples $\{(\lambda, D, \Delta, R, Q)\}$ is the set of all achievable quintuples $(\lambda, D, \Delta, R, Q)$.

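The following theorem characterizes the region of achievable quintuples $(\lambda, D, \Delta, R, Q)$.

Theorem 1 A quintuple $(\lambda, D, \Delta, R, Q)$ is achievable iff

$$\Delta \le \Delta^*(\lambda, R, D, Q) = H(U|W) - \left[ R_{U|V}(D) - \lambda \Gamma\left( \frac{R_{U|V}(D)}{\lambda}, Q \right) - R \right]_+.$$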
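The bound of Theorem 1 is straightforward to evaluate once $H(U|W)$, $R_{U|V}(D)$ and the function $\Gamma$ are available. The following Python sketch is only an evaluation aid under that assumption: the caller supplies the value of $R_{U|V}(D)$ and a callable for $\Gamma$, and the function name `delta_star` and its argument names are hypothetical.

```python
def delta_star(lam, R, Q, h_u_given_w, r_wz, gamma):
    """Evaluate H(U|W) - [R_{U|V}(D) - lam * Gamma(R_{U|V}(D)/lam, Q) - R]_+.

    lam          : bandwidth expansion factor (lambda > 0)
    R            : key rate
    Q            : cost constraint, passed through to gamma
    h_u_given_w  : the conditional entropy H(U|W)
    r_wz         : the value of the Wyner-Ziv function R_{U|V}(D)
    gamma        : callable gamma(r, q) returning Gamma(r, q)
    """
    bracket = r_wz - lam * gamma(r_wz / lam, Q) - R
    return h_u_given_w - max(0.0, bracket)


# Illustrative numbers only: a constant Gamma of 0.2 bits per channel use.
constant_gamma = lambda r, q: 0.2
low_key = delta_star(1.0, 0.1, 1.0, 0.9, 0.5, constant_gamma)
high_key = delta_star(1.0, 0.4, 1.0, 0.9, 0.5, constant_gamma)
```

With the illustrative numbers, the smaller key rate leaves a positive bracketed term, while the larger key rate drives it to zero so that the bound saturates at $H(U|W)$.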



Discussion: A few comments are in order at this point. 

As mentioned in the Introduction, Theorem 1 generalizes earlier results reported in [11], [9], and
[14]. The generalization relative to [14, Theorem 1, "Case of LDBC"] is primarily in the presence
of side informations at the authorized decoder as well as the wiretapper. It should also be noted
that in [14], there is no full proof of the direct part, but only an intuitive argument. Here, we
provide complete proofs for both the converse part and the direct part, which are both based on
the corresponding proofs in [11], but there are a few twists that are necessary in order to incorporate
the secret key, $K$, the side informations, $V^N$ and $W^N$, and the generalized power constraint. For
example, one of the additional ingredients in the proof of the direct part, that is not present in
the direct part of [11], is that we need to show that the key $K$ can be estimated reliably from $U^N$,
$W^N$, and $Z^n$, so that $H(K | U^N, W^N, Z^n)$ is small.

As in [13], Theorem 1 here suggests a separation principle, which guarantees no loss in
asymptotically optimum performance if one separates source coding, encryption, and channel coding. As
will be seen in the proof of the direct part, the proposed achievability scheme consists of Wyner-Ziv
rate-distortion source coding, followed by encryption of the compressed bits, followed in turn by
good channel coding for the wiretapped channel. As is demonstrated in [5], the separation
principle does not always hold in situations that involve source coding, encryption, and channel
coding.

A few words about the intuition behind the achievable upper bound on the equivocation,
$\Delta^*(\lambda, R, D, Q)$: For $R \ge R_{U|V}(D) - \lambda\Gamma(R_{U|V}(D)/\lambda, Q)$, there is enough randomness to achieve
the maximum possible secrecy of $H(U^N | W^N) = NH(U|W)$, which cannot be exceeded even if
the wiretapper did not have access to $Z^n$. For the more interesting case where $R_{U|V}(D) >
\lambda\Gamma(R_{U|V}(D)/\lambda, Q)$ (which in turn means that $R_{U|V}(D)/\lambda$ is above the secrecy capacity), and
$R < R_{U|V}(D) - \lambda\Gamma(R_{U|V}(D)/\lambda, Q)$, we can express $\Delta^*(\lambda, R, D, Q)$ as the sum of four terms:

$$\Delta^*(\lambda, R, D, Q) = [H(U|W) - H(U|V)] + [H(U|V) - R_{U|V}(D)] + \lambda\Gamma\left( \frac{R_{U|V}(D)}{\lambda}, Q \right) + R,$$

where we have added and subtracted $H(U|V)$. Now, the first bracketed term designates the fact
that the wiretapper has side information whose quality is lower than that of the authorized user, a
fact which contributes to the equivocation. The second bracketed term designates uncertainty due
to the information loss at the source encoder (although a general coding scheme may not necessarily
use a source encoder explicitly). Out of the $N R_{U|V}(D)$ bits of the description of the source, $NR$
bits are covered by the key and another $n\Gamma(R_{U|V}(D)/\lambda, Q) = N\lambda\Gamma(R_{U|V}(D)/\lambda, Q)$ bits are covered
by good channel coding for the wiretapped channel, as in [11], [14]. In designing a good coding
scheme, it should be kept in mind, then, that there should be no overlap between the set of bits
encrypted by the key and those that are "hidden" by coding. It is interesting to note that in
the above decomposition of $\Delta^*(\lambda, R, D, Q)$, the first term depends solely on the joint distribution
of $(U, V, W)$, and not on any other factor of the problem, the second term depends only on the
joint distribution of $(U, V)$ and the allowed distortion (but no longer on the joint distribution with
$W$), and the third term depends also on the coded channels. Referring to the previous comment,
it is interesting to note that even in the lossless case ($D = 0$) and even if the coded channels
are clean (i.e., $X^n = Y^n = Z^n$ with probability one), the presence of side information at the
legitimate decoder, which is of better quality than the one at the wiretapper, gives rise to "inherent
secrecy," that is present even without a secret key. In such a case, the last three terms in the
above representation of $\Delta^*(\lambda, R, D, Q)$ all vanish, but the first term is still positive. For example, a
Slepian-Wolf encoder for a source $U$ and side information $V$, which is based on random binning, has
the maximum achievable inherent secrecy of $I(U; V)$ bits/symbol if a wiretapper that observes the
compressed bits has no side information. This is in contrast to the case without side information,
where there is no inherent secrecy at all.

An interesting question that arises concerns optimum strategies and performance limits if one is
interested in maximizing the equivocation of $\hat{U}^N$ instead of, or in addition to, that of $U^N$ (see also [5]),
which is reasonable because it is $\hat{U}^N$ that is the information conveyed from the source. In contrast
to [5], where the problem was fully solved using ordinary rate-distortion coding considerations,
here, because of the presence of side information, the problem remains open.

Finally, as mentioned already in the Abstract and the Introduction, Theorem 1 provides a 
framework for studying the fundamental performance limits of systematic (not necessarily linear) 
codes, in the same manner as in [9], for the wiretap channel. The next section is devoted to such 
a study. 

3 Systematic Vs. Non-Systematic Codes

If $\mathcal{U} = \mathcal{X}$, $\mathcal{V} = \mathcal{Y}$, and the uncoded channel, $P_{V|U}$, is understood as an additional use of the
same physical channel as the coded channel, $P_{Y|X}$, and if $E\{\phi(U)\} \le Q$, then the uncoded path
$U^N \to V^N$ may be thought of as corresponding to the transmission and reception of the systematic
(uncoded) part of a systematic code, where the information symbols are sent directly to the channel.
The total bandwidth expansion factor of this systematic code, when the uncoded part is viewed as
part of the code, is then $(N + n)/N = 1 + \lambda$, assuming that $n/N = \lambda$. For a fully coded (general,
non-systematic) system with the same bandwidth expansion factor, we can use the formula of
$\Delta^*(\lambda, R, D, Q)$, but replace $\lambda$ by $1 + \lambda$ and eliminate the side informations, $V^N$ and $W^N$. The
resulting maximum achievable equivocation of a general code is therefore:

$$\Delta^*_{gen}(\lambda, R, D, Q) = H(U) - \left[ R_U(D) - (1 + \lambda)\Gamma\left( \frac{R_U(D)}{1 + \lambda}, Q \right) - R \right]_+, \tag{5}$$

where $R_U(D)$ is the ordinary rate-distortion function of $U$ (without side information), and we are
interested in comparing this to the original expression of $\Delta^*(\lambda, R, D, Q)$, given in Theorem 1, which
will be denoted by $\Delta^*_{sys}(\lambda, R, D, Q)$ throughout this section. Quite obviously, $\Delta^*_{sys}(\lambda, R, D, Q)$
cannot exceed $\Delta^*_{gen}(\lambda, R, D, Q)$, but it is interesting to identify cases of equality, simply by
comparing the two expressions. We will, however, focus here on a few specific aspects of comparison
between optimum systematic codes and optimum general codes:

1. The full equivocation, that is, the maximum equivocation that can be achieved in the absence
of limitations on the key rate (in which case, the bracketed term of $\Delta^*$ vanishes).

2. The zero key-rate equivocation, which is defined as $\Delta^*$ for $R = 0$. This quantity manifests
the "inherent" security that is already present in the system even without a key. It should
be noted that whenever $\Delta^*_{sys}(\lambda, 0, D, Q) = \Delta^*_{gen}(\lambda, 0, D, Q)$, then, in general (as can be
seen from the expressions of $\Delta^*_{sys}(\lambda, R, D, Q)$ and $\Delta^*_{gen}(\lambda, R, D, Q)$), there is a range of $R$
where $\Delta^*_{sys}(\lambda, R, D, Q) = \Delta^*_{gen}(\lambda, R, D, Q)$ since, in that range, both $\Delta^*_{sys}(\lambda, R, D, Q)$ and
$\Delta^*_{gen}(\lambda, R, D, Q)$ grow linearly with a slope of 45 degrees, starting from their respective values
at $R = 0$.

3. The saturation key rate, which is the smallest value of $R$ for which $\Delta^*$ achieves the full
equivocation. When the saturation key rate is small, then so are the randomization resources
required.

4. The secrecy distortion, which is the value of $D$ for which the channel coding rate equals
the secrecy capacity, in other words, the first argument of the function $\Gamma$ agrees with the
secrecy capacity. This is an interesting working point, because it is the point where the full
equivocation is achieved without using a key at all. In other words, using the terminology
that we have already defined, the zero key-rate equivocation is equal to the full equivocation,
and the saturation key rate vanishes.

While under the first two criteria, systematic codes can never be strictly better than general 
codes, this is not necessarily the case with the last two criteria, because codes that are optimum 
in the maximum equivocation sense may be suboptimal under other criteria. We next compare 
optimum systematic codes to optimum codes from the above four aspects. 
1. The full equivocation: Obviously, this quantity is $H(U|W)$ for systematic codes and $H(U)$
for general codes, thus the difference, $I(U; W)$, depends only on the joint distribution of $U$ and $W$.
In this respect, optimum systematic codes are as good as optimum general codes only if the side
information $W$ is independent of $U$ and hence useless.



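2. The zero key rate equivocation: For $R = 0$, we have

$$\Delta^*_{gen}(\lambda, 0, D, Q) = H(U) - \left[ R_U(D) - (1 + \lambda)\Gamma\left( \frac{R_U(D)}{1 + \lambda}, Q \right) \right]_+ \tag{6}$$

for general codes, and

$$\Delta^*_{sys}(\lambda, 0, D, Q) = H(U|W) - \left[ R_{U|V}(D) - \lambda\Gamma\left( \frac{R_{U|V}(D)}{\lambda}, Q \right) \right]_+ \tag{7}$$

for systematic codes. Let us assume that the bracketed terms in both expressions are positive
(otherwise, we are back to the comparison of the previous paragraph). Comparing the two expressions,
we see that equality is achieved if

$$(1 + \lambda)\Gamma\left( \frac{R_U(D)}{1 + \lambda}, Q \right) - \lambda\Gamma\left( \frac{R_{U|V}(D)}{\lambda}, Q \right) = R_U(D) - R_{U|V}(D) - I(U; W). \tag{8}$$

As is shown in [9, eqs. (2.12), (2.13)], the difference $R_U(D) - R_{U|V}(D)$ is never larger than $I(U; V)$,
but there are cases of equality, most notably, the lossless case $D = 0$, as $R_U(0) = H(U)$ and
$R_{U|V}(0) = H(U|V)$ (see footnote 1). Thus, at least in the lossless case, eq. (8) boils down to

$$(1 + \lambda)\Gamma\left( \frac{H(U)}{1 + \lambda}, Q \right) - \lambda\Gamma\left( \frac{H(U|V)}{\lambda}, Q \right) = I(U; V) - I(U; W). \tag{9}$$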

Now, in quite a few examples of interest, $\Gamma(r, q)$ is equal to a constant, $\Gamma_0$, throughout the entire
interesting range of $r$. One such example occurs when $q = \infty$ (i.e., no generalized power constraint),
$P_{Y|X}$ is the noiseless binary channel and $P_{Z|Y}$ is a binary symmetric channel (BSC) with crossover
probability $p_0$ (cf. [11, p. 1362]), in which case, $\Gamma_0 = h(p_0)$, where $h(\cdot)$ is the binary entropy
function. In this case, the left-hand side of eq. (9) becomes $h(p_0)$ independently of $\lambda$. Now, if $P_{V|U}$
has the same characteristics as $P_{Y|X}$, and similarly $P_{W|V}$ has the same characteristics as $P_{Z|Y}$
(which is indeed the case in systematic coding applications), and if $P_U$ is the binary symmetric
source (BSS), then it achieves the maximum of $I(U; V) - I(U; W)$, which is, again, $\Gamma_0 = h(p_0)$. In
this case, therefore, the equality (9) is achieved. Similarly, if $P_{Y|X}$ is noiseless as before, but $P_{Z|Y}$ is
an erasure channel with erasure probability $p_0$, then $\Gamma_0 = p_0$, and once again, equality is achieved if
$P_U$ is the BSS. Yet another example of this type occurs when both $P_{Y|X}$ and $P_{Z|Y}$ are independent
Gaussian channels with an input power constraint defined in terms of $\phi(x) = x^2$ (and hence, so are
$P_{V|U}$ and $P_{W|V}$). In this case, as was shown in [4], $\Gamma_0 = C_{X \to Y} - C_{X \to Z}$, the difference between
the capacities of the channels $P_{Y|X}$ and $P_{Z|X}$. Here, equality in (9) is achieved if $U$ is a zero-mean
Gaussian random variable whose variance coincides with the maximum allowable input power, $Q$.
Thus, we have demonstrated a few non-trivial examples where optimum systematic codes are as
good as optimum codes in the absence of a secret key.

Footnote 1: Another example is the Gaussian source $U$, the Gaussian channel $P_{V|U}$, and the squared error distortion measure,
where $R_U(D) - R_{U|V}(D) = [h(U) - \frac{1}{2}\log(2\pi e D)] - [h(U|V) - \frac{1}{2}\log(2\pi e D)] = I(U; V)$ throughout the entire interesting
range of distortion levels.
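The binary example above can be checked numerically. The sketch below compares the two sides of eq. (9) under the assumptions stated in the text: $\Gamma(r, q)$ is the constant $h(p_0)$, the source is a BSS, the side-information channel $P_{V|U}$ is noiseless, and $P_{W|V}$ is a BSC with crossover $p_0$. The function names are hypothetical.

```python
import math


def h2(p: float) -> float:
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def eq9_sides_binary(lam: float, p0: float):
    """Return (left-hand side, right-hand side) of eq. (9) in the BSS/BSC example."""
    gamma0 = h2(p0)                         # constant value of Gamma
    lhs = (1 + lam) * gamma0 - lam * gamma0
    i_uv = 1.0                              # V = U for a noiseless channel: I(U;V) = H(U) = 1
    i_uw = 1.0 - h2(p0)                     # BSS through a BSC with crossover p0
    rhs = i_uv - i_uw
    return lhs, rhs


lhs, rhs = eq9_sides_binary(lam=2.0, p0=0.11)
```

Both sides reduce to $h(p_0)$ for any $\lambda$, which is the independence of $\lambda$ noted in the text.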

3. The saturation key rate: Here, we obtain

$$R^*_{gen} = R_U(D) - (1 + \lambda)\Gamma\left( \frac{R_U(D)}{1 + \lambda}, Q \right) \tag{10}$$

for general codes, and

$$R^*_{sys} = R_{U|V}(D) - \lambda\Gamma\left( \frac{R_{U|V}(D)}{\lambda}, Q \right) \tag{11}$$

for systematic codes. The condition for having a smaller saturation key rate for systematic codes
is

$$(1 + \lambda)\Gamma\left( \frac{R_U(D)}{1 + \lambda}, Q \right) - \lambda\Gamma\left( \frac{R_{U|V}(D)}{\lambda}, Q \right) < R_U(D) - R_{U|V}(D), \tag{12}$$

namely, the comparison is similar to the one made with regard to the zero key-rate equivocation
criterion, but without the term $I(U; W)$. As we have previously shown examples of equality, even in
the presence of the term $I(U; W)$, the same examples can serve now for the desired inequality
in the absence of this term. In these examples, as well as in many others, optimum systematic
codes are advantageous over optimum codes in general.
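As a numerical companion to eqs. (10) and (11), the sketch below evaluates both saturation key rates in the Gaussian example, under assumptions that are spelled out here rather than taken as the paper's general setting: $U$ is zero-mean Gaussian with variance $Q$, the distortion is squared error, $P_{V|U}$ has the same noise variance as $P_{Y|X}$, $\Gamma$ is the constant $C_{X \to Y} - C_{X \to Z}$, and $D$ is small enough that both rate-distortion functions are positive. Rates are in bits, and all names are hypothetical.

```python
import math


def capacity(snr: float) -> float:
    """Gaussian channel capacity in bits per channel use."""
    return 0.5 * math.log2(1 + snr)


def saturation_key_rates(lam, D, Q, noise_main, noise_extra):
    """Return (R*_gen, R*_sys) from eqs. (10) and (11) in the Gaussian example.

    noise_main  : noise variance of X -> Y (and, by assumption, of U -> V)
    noise_extra : additional noise variance of Y -> Z
    """
    gamma0 = capacity(Q / noise_main) - capacity(Q / (noise_main + noise_extra))
    var_u_given_v = Q * noise_main / (Q + noise_main)   # MMSE of U given V
    r_u = 0.5 * math.log2(Q / D)                        # R_U(D)
    r_wz = 0.5 * math.log2(var_u_given_v / D)           # R_{U|V}(D)
    r_gen = r_u - (1 + lam) * gamma0
    r_sys = r_wz - lam * gamma0
    return r_gen, r_sys


r_gen, r_sys = saturation_key_rates(lam=1.0, D=0.05, Q=1.0,
                                    noise_main=0.5, noise_extra=1.0)
```

Under these assumptions the gap $R^*_{gen} - R^*_{sys}$ equals $I(U; V) - \Gamma_0 = C_{X \to Z}$, which is nonnegative, so the systematic scheme needs no more key than the general one.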

4. The secrecy distortion: As mentioned earlier, Wyner [11] has established the notion of the
secrecy capacity, $C_s$, which is the maximum coding rate for which full secrecy is still achieved even
without a key. Here we ask how systematic and non-systematic codes compare in terms of the
distortion, $D$, for which the rate of the channel code meets the secrecy capacity. For non-systematic
codes, this distortion level is given by the solution to the equation

$$\frac{R_U(D)}{1 + \lambda} = C_s, \tag{13}$$

which is

$$D^*_{gen} = D_U((1 + \lambda)C_s), \tag{14}$$

where $D_U(\cdot)$ is the ordinary distortion-rate function of $U$ (without side information). For systematic
coding, on the other hand, it is the solution to the equation

$$\frac{R_{U|V}(D)}{\lambda} = C_s, \tag{15}$$

which is

$$D^*_{sys} = D_{U|V}(\lambda C_s), \tag{16}$$

where $D_{U|V}(\cdot)$ is the Wyner-Ziv distortion-rate function of $U$ with side information $V$. The answer
to the question of which class of codes is better in terms of the secrecy distortion depends on the
parameters of the problem. One simple extreme example pertains to the case $C_s = 0$ (which
happens, e.g., when the channel $P_{Z|Y}$ is clean and hence $Z^n = Y^n$ with probability one). In this
case,

$$D^*_{sys} = D_{U|V}(0) = \min_{\psi} E\{d(U, \psi(V))\} \tag{17}$$

is clearly no larger than

$$D^*_{gen} = D_U(0) = \min_{\hat{u}} E\{d(U, \hat{u})\}. \tag{18}$$

While the case where $C_s$ is strictly zero clearly trivializes the whole problem, it is, of
course, conceivable that for small enough positive values of $C_s$, continuity arguments imply that
systematic codes still outperform non-systematic codes in the secrecy distortion sense.

As a somewhat less trivial example, consider the case where $U$ is zero-mean, Gaussian, with
variance $\sigma_U^2$, the channels are Gaussian and independent, and $d$ is the squared error criterion. Then,

$$D^*_{gen} = \sigma_U^2 \cdot 2^{-2(1 + \lambda)(C_{X \to Y} - C_{X \to Z})}, \tag{19}$$

whereas

$$D^*_{sys} = \sigma_{U|V}^2 \cdot 2^{-2\lambda(C_{X \to Y} - C_{X \to Z})}, \tag{20}$$

where $\sigma_{U|V}^2$ is the minimum mean squared error associated with optimum (linear) estimation of $U$
based on $V$. Thus, $D^*_{sys} \le D^*_{gen}$ whenever

$$C_{X \to Y} - C_{X \to Z} \le \frac{1}{2}\log\frac{\sigma_U^2}{\sigma_{U|V}^2} = \frac{1}{2}\log\left( 1 + \frac{\sigma_U^2}{\sigma^2} \right), \tag{21}$$

where $\sigma^2$ is the variance of the noise of the (Gaussian) channel from $U$ to $V$. Note that the
dependence upon $\lambda$ disappeared. The last inequality is clearly met if, for example, the channel
$P_{Y|X}$ is the same as the channel $P_{V|U}$ and $\sigma_U^2 = Q$ (in which case, the right-hand side becomes
$C_{X \to Y}$).
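The Gaussian comparison can be reproduced numerically. The sketch below assumes the standard Gaussian distortion-rate forms $D_U(r) = \sigma_U^2 2^{-2r}$ and $D_{U|V}(r) = \sigma_{U|V}^2 2^{-2r}$ with rates in bits, and takes the secrecy capacity $C_s = C_{X \to Y} - C_{X \to Z}$ as a given input. The function and argument names are hypothetical.

```python
import math


def secrecy_distortions(lam, var_u, noise_uv, c_s):
    """Return (D*_gen, D*_sys) for a Gaussian source with squared error distortion.

    var_u    : source variance
    noise_uv : noise variance of the Gaussian channel from U to V
    c_s      : secrecy capacity in bits per channel use
    """
    var_u_given_v = var_u * noise_uv / (var_u + noise_uv)  # linear MMSE of U from V
    d_gen = var_u * 2.0 ** (-2.0 * (1 + lam) * c_s)
    d_sys = var_u_given_v * 2.0 ** (-2.0 * lam * c_s)
    return d_gen, d_sys


def systematic_not_worse(var_u, noise_uv, c_s):
    """Condition (21): C_s <= (1/2) log2(1 + var_u / noise_uv)."""
    return c_s <= 0.5 * math.log2(1 + var_u / noise_uv)


d_gen_low, d_sys_low = secrecy_distortions(lam=2.0, var_u=1.0, noise_uv=1.0, c_s=0.2)
d_gen_high, d_sys_high = secrecy_distortions(lam=2.0, var_u=1.0, noise_uv=1.0, c_s=0.8)
```

With unit source and noise variances the threshold in (21) is half a bit, so the first parameter set favors the systematic code and the second favors the general code, for any value of `lam`.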

Note that in this aspect of the secrecy distortion, our comparison between systematic codes
and non-systematic codes is of the same spirit as in [9], in the sense that both are about equating
rate-distortion functions to capacities. The only difference is that here, as opposed to [9], $C_s$
replaces $C_{X \to Y}$ in these equations (as there is only one coded channel and one uncoded channel
in [9]). Obviously, in the comparisons carried out in [9], systematic codes can never outperform
non-systematic codes. By contrast, as we have seen here, when the secrecy capacity is the working
point, this becomes possible.

Finally, one more comment is in order regarding systematic codes: In a real systematic code for the
wiretap channel, there is, in principle, the freedom to use part of the secret key in order to encrypt
the systematic symbols as well. This freedom has not been exploited thus far, and the question is
whether there is any advantage in doing so. Suppose that the source $U$ is binary and the key rate
is $R$ bits per source symbol ($R < 1$). Consider the following coding scheme. We select $0 \le R' \le R$,
and for each block $U^N$, we use $NR'$ key bits to encrypt the systematic part and $N(R - R')$ key bits
to encrypt the Wyner-Ziv rate-distortion codeword before it is fed into the channel encoder of [11]
(see also the proof of the direct part in Section 5). Then, by a slight extension of the analysis in
Section 5 to follow, the resulting equivocation is essentially

$$\Delta \approx R'H(U) + (1 - R')H(U|W) + (R - R') + \lambda\Gamma(R_{U|V}(D)/\lambda, Q) - R_{U|V}(D). \tag{22}$$

Since the coefficient of $R'$ in this expression is $I(U; W) - 1 \le 0$, the best choice of $R'$, in this
example, is $R' = 0$, namely, secret key bits should better not be used for encrypting the systematic
bits, but only the coded bits, as we assumed thus far.
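The dependence of expression (22) on $R'$ is linear, which a short numerical sweep makes visible. The sketch below evaluates (22) on a grid of $R'$ values in $[0, R]$. The numerical inputs are illustrative only, the term $\lambda\Gamma(R_{U|V}(D)/\lambda, Q)$ is passed in as a single precomputed number, and the function names are hypothetical.

```python
def equivocation_with_split(r_prime, R, h_u, h_u_given_w, lam_gamma, r_wz):
    """Right-hand side of expression (22) for a binary source.

    r_prime   : key rate spent on the systematic part (0 <= r_prime <= R)
    lam_gamma : precomputed value of lambda * Gamma(R_{U|V}(D)/lambda, Q)
    r_wz      : the value of R_{U|V}(D)
    """
    return (r_prime * h_u + (1 - r_prime) * h_u_given_w
            + (R - r_prime) + lam_gamma - r_wz)


def best_split(R, h_u, h_u_given_w, lam_gamma, r_wz, steps=100):
    """Grid search over r_prime in [0, R] for the largest value of (22)."""
    candidates = [R * k / steps for k in range(steps + 1)]
    return max(candidates,
               key=lambda rp: equivocation_with_split(
                   rp, R, h_u, h_u_given_w, lam_gamma, r_wz))


# Illustrative numbers: H(U) = 1, H(U|W) = 0.6, so the slope is I(U;W) - 1 = -0.6.
chosen = best_split(R=0.5, h_u=1.0, h_u_given_w=0.6, lam_gamma=0.2, r_wz=0.4)
```

Because the slope $I(U; W) - 1$ is negative here, the search returns $R' = 0$, matching the conclusion drawn from (22).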
4 Proof of the Converse Part of Theorem 1



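Let an $(N, n, \lambda + \epsilon, D + \epsilon, \Delta - \epsilon, R + \epsilon, Q + \epsilon)$ codec be given. Consider first the following chain of
inequalities, which will be used later on.

$$\begin{aligned}
I(X^n; Y^n | K, V^N, W^N) &\overset{(a)}{\ge} I(U^N; Y^n | K, V^N, W^N) \\
&= \sum_{i=1}^{N} I(U_i; Y^n | K, V^N, W^N, U^{i-1}) \\
&= \sum_{i=1}^{N} \left[ H(U_i | K, V^N, W^N, U^{i-1}) - H(U_i | Y^n, K, V^N, W^N, U^{i-1}) \right] \\
&\overset{(b)}{\ge} \sum_{i=1}^{N} \left[ H(U_i | V_i) - H(U_i | Y^n, K, V^N) \right] \\
&\overset{(c)}{=} \sum_{i=1}^{N} \left[ H(U_i | V_i) - H(U_i | A_i, V_i) \right] \\
&= \sum_{i=1}^{N} I(U_i; A_i | V_i) \\
&= \sum_{i=1}^{N} \left[ H(A_i | V_i) - H(A_i | U_i, V_i) \right] \\
&\overset{(d)}{=} \sum_{i=1}^{N} \left[ H(A_i | V_i) - H(A_i | U_i) \right] \\
&= \sum_{i=1}^{N} \left[ I(U_i; A_i) - I(V_i; A_i) \right] \\
&\overset{(e)}{\ge} \sum_{i=1}^{N} R_{U|V}\left( E\, d(U_i, [g_{n,N}(A_i, V_i)]_i) \right) \\
&\overset{(f)}{\ge} N R_{U|V}\left( \frac{1}{N} \sum_{i=1}^{N} E\, d(U_i, [g_{n,N}(A_i, V_i)]_i) \right) \\
&\overset{(g)}{\ge} N R_{U|V}(D + \epsilon),
\end{aligned} \tag{23}$$

where (a) follows from the fact that $U^N \ominus (K, V^N, W^N, X^n) \ominus Y^n$ is a Markov chain, (b) is because
conditioning reduces entropy (the first term equals $H(U_i | V_i)$ since the source and the side
information channels are memoryless and $K$ is independent of them), in (c) $A_i$ is defined as
$(Y^n, K, V^{i-1}, V_{i+1}^N)$, (d) is because $A_i \ominus U_i \ominus V_i$ is
a Markov chain, (e) is by definition of the Wyner-Ziv rate-distortion function, where $[g_{n,N}(A_i, V_i)]_i$
is the projection of $g_{n,N}(A_i, V_i) = g_{n,N}(Y^n, V^N, K)$ to the $i$-th component, (f) is due to the
convexity of the Wyner-Ziv rate-distortion function [12], [2, Lemma 14.9.1, p. 439], and (g) is due
to its monotonicity, and the hypothesis that the codec achieves distortion $D + \epsilon$.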

We next derive two upper bounds on $\Delta$. The first one is trivial: 

$$
N(\Delta - \epsilon) \le H(U^N|W^N, Z^n) \le H(U^N|W^N) = NH(U|W), \tag{24}
$$

and so, 

$$
\Delta \le H(U|W) \tag{25}
$$

due to the arbitrariness of $\epsilon$. The other, more interesting, upper bound on $\Delta$ is obtained as follows: 
First, we observe that 

$$
N(\Delta - \epsilon) \le H(U^N|W^N, Z^n) = I(U^N; V^N, K|W^N, Z^n) + H(U^N|V^N, K, W^N, Z^n). \tag{26}
$$

Next, we bound from above each one of the terms on the right-most side. As for the first term, we 
have 

$$
\begin{aligned}
I(U^N;V^N,K|W^N,Z^n) &= I(U^N;K|W^N,Z^n) + I(U^N;V^N|K,W^N,Z^n) \\
&\le H(K|W^N,Z^n) + H(V^N|K,W^N,Z^n) - H(V^N|U^N,K,W^N,Z^n) \\
&\le H(K) + H(V^N|W^N) - H(V^N|U^N,W^N) \\
&\le N(R+\epsilon) + NI(U;V|W) \\
&= N[R + I(U;V|W) + \epsilon],
\end{aligned}
\tag{27}
$$

where in the second inequality we have used the fact that $V^N \to (U^N, W^N) \to (Z^n, K)$ is a Markov 
chain. As for the second term on the r.h.s. of (26), we have: 

$$
\begin{aligned}
H(U^N|V^N,K,W^N,Z^n) &= H(U^N|W^N) - I(U^N;V^N,K,Z^n|W^N) \\
&= NH(U|W) - I(U^N;V^N|W^N) - I(U^N;K,Z^n|W^N,V^N) \\
&= N[H(U|W) - I(U;V|W)] - I(U^N;K,Z^n|W^N,V^N) \\
&= NH(U|V,W) - I(U^N;K,Y^n|W^N,V^N) \\
&\quad + [I(U^N;K,Y^n|W^N,V^N) - I(U^N;K,Z^n|W^N,V^N)].
\end{aligned}
\tag{28}
$$

We proceed by deriving a lower bound to $I(U^N;K,Y^n|W^N,V^N)$ and an upper bound to the 
bracketed term in the last expression. As for the former, we have: 

$$
I(U^N;K,Y^n|W^N,V^N) \ge I(U^N;Y^n|W^N,V^N,K) \ge NR_{U|V}(D+\epsilon), \tag{29}
$$

where the second inequality has been proven above (compare the right-hand side of the first line 
of eq. (23) with the right-most side of that equation). As for the upper bound to the bracketed 
term on the right-most side of (28), we have: 

$$
\begin{aligned}
& I(U^N;K,Y^n|W^N,V^N) - I(U^N;K,Z^n|W^N,V^N) \\
&\overset{(a)}{=} I(U^N;K,Y^n,W^N,V^N) - I(U^N;K,Z^n,W^N,V^N) \\
&\overset{(b)}{=} I(U^N,K;K,Y^n,W^N,V^N) - I(U^N,K;K,Z^n,W^N,V^N) \\
&\overset{(c)}{=} I(U^N,K,X^n;K,Y^n,W^N,V^N) - I(U^N,K,X^n;K,Z^n,W^N,V^N) \\
&\overset{(d)}{=} I(U^N,K,X^n;Y^n|K,W^N,V^N) - I(U^N,K,X^n;Z^n|K,W^N,V^N) \\
&\overset{(e)}{=} I(U^N,X^n;Y^n|K,W^N,V^N) - I(U^N,X^n;Z^n|K,W^N,V^N) \\
&= I(X^n;Y^n|K,W^N,V^N) - I(X^n;Z^n|K,W^N,V^N) \\
&\overset{(f)}{=} H(Y^n|K,W^N,V^N) - H(Z^n|K,W^N,V^N) \\
&\quad + H(Z^n|X^n,K,W^N,V^N) - H(Y^n|X^n,K,W^N,V^N) \\
&= \sum_{i=1}^n [H(Y_i|Y^{i-1},K,W^N,V^N) - H(Z_i|Z^{i-1},K,W^N,V^N) \\
&\quad + H(Z_i|X_i,K,W^N,V^N) - H(Y_i|X_i,K,W^N,V^N)] \\
&\overset{(g)}{\le} \sum_{i=1}^n [H(Y_i|Y^{i-1},K,W^N,V^N) - H(Z_i|Z^{i-1},Y^{i-1},K,W^N,V^N) \\
&\quad + H(Z_i|X_i,K,W^N,V^N) - H(Y_i|X_i,K,W^N,V^N)] \\
&\overset{(h)}{=} \sum_{i=1}^n [H(Y_i|Y^{i-1},K,W^N,V^N) - H(Z_i|Y^{i-1},K,W^N,V^N) \\
&\quad + H(Z_i|X_i,Y^{i-1},K,W^N,V^N) - H(Y_i|X_i,Y^{i-1},K,W^N,V^N)] \\
&= \sum_{i=1}^n [I(X_i;Y_i|Y^{i-1},K,W^N,V^N) - I(X_i;Z_i|Y^{i-1},K,W^N,V^N)] \\
&= \sum_{i=1}^n [H(X_i|Z_i,Y^{i-1},K,W^N,V^N) - H(X_i|Y_i,Y^{i-1},K,W^N,V^N)] \\
&\overset{(i)}{=} \sum_{i=1}^n [H(X_i|Z_i,Y^{i-1},K,W^N,V^N) - H(X_i|Y_i,Z_i,Y^{i-1},K,W^N,V^N)] \\
&= \sum_{i=1}^n I(X_i;Y_i|Z_i,Y^{i-1},K,W^N,V^N),
\end{aligned}
\tag{30}
$$

where (a) is by adding and subtracting $I(U^N; V^N, W^N)$, (b) is by adding $I(K; K, Y^n, W^N, V^N|U^N) = H(K|U^N)$ 
and subtracting $I(K; K, Z^n, W^N, V^N|U^N) = H(K|U^N)$, (c) is by the fact that $X^n$ 
is a function of $U^N$ and $K$, (d) is by adding and subtracting $I(U^N, K, X^n; K, W^N, V^N)$, (e) 
is by the fact that $K$ is degenerate as it appears in the conditioning, (f) is by the fact that 
$U^N \to (X^n, K, W^N, V^N) \to Y^n \to Z^n$ is a Markov chain, (g) is because conditioning reduces entropy, 
(h) is because $Z^{i-1} \to (Y^{i-1}, K, W^N, V^N) \to Z_i$ and $(Z_i, Y_i) \to (X_i, K, V^N, W^N) \to Y^{i-1}$ are Markov 
chains, and (i) is because $X_i \to (Y_i, Y^{i-1}, K, W^N, V^N) \to Z_i$ is a Markov chain. 

At this point, we are after an upper bound to $\sum_{i=1}^n I(X_i;Y_i|Z_i,Y^{i-1},K,W^N,V^N)$, subject to 
the fact that 

$$
\begin{aligned}
NR_{U|V}(D+\epsilon) &\le I(X^n;Y^n|K,V^N,W^N) \\
&= \sum_{i=1}^n [H(Y_i|Y^{i-1},K,V^N,W^N) - H(Y_i|X_i,Y^{i-1},K,V^N,W^N)] \\
&= \sum_{i=1}^n I(X_i;Y_i|Y^{i-1},K,V^N,W^N),
\end{aligned}
\tag{31}
$$

where, once again, the first inequality has been proved already in (23). For given $k$, $v^N$, $w^N$, and 
$y^{i-1}$, $i = 1, 2, \ldots, n$, let 

$$
\alpha_i(k,v^N,w^N,y^{i-1}) = I(X_i;Y_i|K=k,V^N=v^N,W^N=w^N,Y^{i-1}=y^{i-1}) \tag{32}
$$

and 

$$
\beta_i(k,v^N,w^N,y^{i-1}) = E\{\phi(X_i)|K=k,V^N=v^N,W^N=w^N,Y^{i-1}=y^{i-1}\}. \tag{33}
$$

Obviously, by definition of the function $\Gamma$, 

$$
I(X_i;Y_i|Z_i,K=k,V^N=v^N,W^N=w^N,Y^{i-1}=y^{i-1}) \le \Gamma(\alpha_i(k,v^N,w^N,y^{i-1}), \beta_i(k,v^N,w^N,y^{i-1})). \tag{34}
$$

Thus, 

$$
\begin{aligned}
&\sum_{i=1}^n I(X_i;Y_i|Z_i,Y^{i-1},K,W^N,V^N) \\
&= \sum_{i=1}^n \sum_{k,v^N,w^N,y^{i-1}} \Pr\{K=k,V^N=v^N,W^N=w^N,Y^{i-1}=y^{i-1}\} \times \\
&\qquad I(X_i;Y_i|Z_i,K=k,V^N=v^N,W^N=w^N,Y^{i-1}=y^{i-1}) \\
&\le \sum_{i=1}^n \sum_{k,v^N,w^N,y^{i-1}} \Pr\{K=k,V^N=v^N,W^N=w^N,Y^{i-1}=y^{i-1}\} \times \\
&\qquad \Gamma(\alpha_i(k,v^N,w^N,y^{i-1}), \beta_i(k,v^N,w^N,y^{i-1})) \\
&\overset{(a)}{\le} n\Gamma\Bigg(\frac{1}{n}\sum_{i=1}^n \sum_{k,v^N,w^N,y^{i-1}} \Pr\{K=k,V^N=v^N,W^N=w^N,Y^{i-1}=y^{i-1\}} \cdot \alpha_i(k,v^N,w^N,y^{i-1}), \\
&\qquad\quad \frac{1}{n}\sum_{i=1}^n \sum_{k,v^N,w^N,y^{i-1}} \Pr\{K=k,V^N=v^N,W^N=w^N,Y^{i-1}=y^{i-1}\} \cdot \beta_i(k,v^N,w^N,y^{i-1})\Bigg) \\
&= n\Gamma\left(\frac{1}{n}\sum_{i=1}^n I(X_i;Y_i|Y^{i-1},K,V^N,W^N), \frac{1}{n}\sum_{i=1}^n E\{\phi(X_i)\}\right) \\
&\overset{(b)}{\le} n\Gamma\left(\frac{N}{n}R_{U|V}(D+\epsilon), Q+\epsilon\right) \\
&\overset{(c),(d)}{\le} N(\lambda+\epsilon)\,\Gamma\left(\frac{R_{U|V}(D+\epsilon)}{\lambda+\epsilon}, Q+\epsilon\right),
\end{aligned}
\tag{35}
$$

where (a) follows from the concavity of $\Gamma(r, q)$ jointly in both arguments,$^2$ together with its non- 
increasing monotonicity in $r$ and non-decreasing monotonicity in $q$, (b) from (31) and the non- 
increasing monotonicity of the function $\Gamma(\cdot)$, and (c) and (d) from the postulate that the band- 
width expansion factor of the codec does not exceed $\lambda + \epsilon$. Combining eqs. (25), (26), (27), (28), 
(29), (30), and (35), and using the arbitrariness of $\epsilon$ with continuity considerations, we get 

$$
\begin{aligned}
\Delta &\le \min\left\{H(U|W),\; R + I(U;V|W) + H(U|V,W) + \lambda\Gamma\left(\frac{R_{U|V}(D)}{\lambda}, Q\right) - R_{U|V}(D)\right\} \\
&= \min\left\{H(U|W),\; H(U|W) + R + \lambda\Gamma\left(\frac{R_{U|V}(D)}{\lambda}, Q\right) - R_{U|V}(D)\right\} \\
&= H(U|W) - \left[R_{U|V}(D) - \lambda\Gamma\left(\frac{R_{U|V}(D)}{\lambda}, Q\right) - R\right]_+ \\
&= \Delta^*(\lambda, R, D, Q),
\end{aligned}
\tag{36}
$$

$^2$ This can readily be verified as a trivial extension of [11, Lemma 1] that accounts for the generalized power 
constraint. 



which establishes the converse part of Theorem 1. 
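The two equivalent forms of the bound in (36) can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the value of $\Gamma(R_{U|V}(D)/\lambda, Q)$ is passed in as a plain number (`gamma_value`) rather than computed from a channel model.

```python
def delta_star(h_u_given_w, key_rate, r_wz, lam, gamma_value):
    """Bracketed form of (36): H(U|W) - [R_{U|V}(D) - lam*Gamma - R]_+ ."""
    deficit = r_wz - lam * gamma_value - key_rate
    return h_u_given_w - max(0.0, deficit)

def delta_star_min_form(h_u_given_w, key_rate, r_wz, lam, gamma_value):
    """Min form of (36): min{H(U|W), H(U|W) + R + lam*Gamma - R_{U|V}(D)}."""
    return min(h_u_given_w, h_u_given_w + key_rate + lam * gamma_value - r_wz)

# Hypothetical numbers (bits): H(U|W)=0.8, R=0.1, R_{U|V}(D)=0.5, lam=1, Gamma=0.2.
limited = delta_star(0.8, 0.1, 0.5, 1.0, 0.2)
# With a larger key rate the bracketed term is non-positive and H(U|W) is reached.
saturated = delta_star(0.8, 0.4, 0.5, 1.0, 0.2)
```

In the first call the key rate falls short of $R_{U|V}(D) - \lambda\Gamma$ by 0.2 bits, so the bound drops below $H(U|W)$ by that amount; in the second call the bound saturates at $H(U|W)$.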
5 Proof of the Direct Part of Theorem 1 



We begin with the following chain of equalities and inequalities: 

$$
\begin{aligned}
N\Delta &= H(U^N|W^N,Z^n) \\
&= H(U^N,Z^n|W^N) - H(Z^n|W^N) \\
&= H(U^N,Z^n,X^n,K|W^N) - H(X^n,K|U^N,Z^n,W^N) - H(Z^n|W^N) \\
&= H(X^n,K,U^N|W^N) + H(Z^n|X^n,K,U^N,W^N) - H(X^n,K|U^N,Z^n,W^N) - H(Z^n|W^N) \\
&= H(Z^n|X^n,K,U^N,W^N) + H(U^N|W^N) \\
&\quad + [H(X^n,K|U^N,W^N) - H(X^n,K|U^N,Z^n,W^N)] - H(Z^n|W^N) \\
&= NH(U|W) + H(Z^n|X^n,K,U^N,W^N) + I(X^n,K;Z^n|U^N,W^N) - H(Z^n|W^N) \\
&\overset{(a)}{\ge} NH(U|W) + H(Z^n|X^n) + I(X^n,K;Z^n|U^N,W^N) - H(Z^n) \\
&= NH(U|W) + I(X^n,K;Z^n|U^N,W^N) - I(X^n;Z^n) \\
&= NH(U|W) + I(K;Z^n|U^N,W^N) + I(X^n;Z^n|U^N,W^N,K) - I(X^n;Z^n) \\
&= NH(U|W) + H(K|U^N,W^N) - H(K|U^N,W^N,Z^n) + I(X^n;Z^n|U^N,W^N,K) - I(X^n;Z^n) \\
&\overset{(b)}{=} NH(U|W) + H(K) - H(K|U^N,W^N,Z^n) + I(X^n;Z^n|U^N,W^N,K) - I(X^n;Z^n) \\
&\overset{(c)}{=} NH(U|W) + NR - H(K|U^N,W^N,Z^n) + I(X^n;Z^n|U^N,W^N,K) - I(X^n;Z^n),
\end{aligned}
\tag{37}
$$

where (a) follows from the fact that $(K, U^N, W^N) \to X^n \to Z^n$ is a Markov chain, (b) from the fact 
that $K$ is independent of $(U^N, W^N)$, and (c) by assuming that $H(K) = NR$. This chain of 
equalities and inequalities holds for any codec; in order to proceed, however, we will have to be specific, 
from now on, about the structure and the properties of the codec. In particular, referring to the 
right-most side of the above lower bound to $N\Delta$, in order to prove the direct part, we will have 
to prove that for our proposed codec (and as long as $R$ is not too large): (i) $H(K|U^N, W^N, Z^n)$ 
is small, (ii) $I(X^n;Z^n)$ is essentially no larger than $nI(X;Z)$, and (iii) $I(X^n;Z^n|U^N, W^N, K)$ is 
essentially no smaller than $nI(X;Y) - NR_{U|V}(D)$, where in (ii) and (iii) the distribution of the random 
variable $X$ is the achiever of $\Gamma(R_{U|V}(D)/\lambda, Q)$. 

Fix an arbitrarily small $\epsilon > 0$, and let $D$ satisfy $R_{U|V}(D) < \lambda C_{X \to Y} - \epsilon$, where $C_{X \to Y}$ is the 
capacity of the channel $P_{Y|X}$. Given such $D$ and $\epsilon > 0$, let $X^*$ denote the channel input variable 
that achieves $\Gamma((R_{U|V}(D) + \epsilon)/\lambda, Q)$. Let $Y^*$ and $Z^*$ denote the channel output variables induced 
by $X^*$ and the channels $P_{Y|X}$ and $P_{Z|Y}$, respectively. Thus, 

$$
I(X^*;Y^*) - I(X^*;Z^*) = \Gamma\left(\frac{R_{U|V}(D)+\epsilon}{\lambda}, Q\right) \tag{38}
$$

and 

$$
I(X^*;Y^*) \ge \frac{R_{U|V}(D)+\epsilon}{\lambda}. \tag{39}
$$

Let us further suppose now that for the resulting optimal RV's $X^*$, $Y^*$, and $Z^*$, we have: 

$$
\frac{R_{U|V}(D)}{\lambda} > I(X^*;Y^*) - I(X^*;Z^*). \tag{40}
$$



In the sequel, we will handle separately the case where (40) does not hold. Further, let $\mathcal{T}_n$ denote 
the set of $n^{-1/4}$-typical $n$-sequences with components in $\mathcal{X}$, i.e., the set of sequences for which 
the relative frequency of each $x \in \mathcal{X}$ differs from $P_{X^*}(x)$ by no more than $n^{-1/4}$. The following 
lemma, which is Lemma 8 of [11], guarantees that if the encoder is such that, with high probability, 
$X^n \in \mathcal{T}_n$, then condition (ii) above is essentially satisfied: 

Lemma 1 [11, Lemma 8] Let $X^n$ and $Z^n$ be induced by an arbitrary encoder and the cascaded 
channel from $X^n$ to $Z^n$. Then, 

$$
\frac{I(X^n;Z^n)}{n} \le I(X^*;Z^*) + \Pr\{X^n \in \mathcal{T}_n^c\} \cdot \log|\mathcal{X}| + f_1(n), \tag{41}
$$

where $f_1(n) \to 0$ as $n \to \infty$. 
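Membership in the typical set $\mathcal{T}_n$ is a simple empirical-frequency test. A minimal sketch, assuming a sequence is given as a Python list over a finite alphabet and $P_{X^*}$ as a dictionary from symbols to probabilities (both are hypothetical representations chosen for illustration):

```python
from collections import Counter

def is_typical(seq, p_star):
    """True if every symbol's relative frequency in `seq` is within
    n**(-1/4) of its probability under `p_star` (symbols absent from
    `p_star` are treated as having probability zero)."""
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must be non-empty")
    tol = n ** (-0.25)
    counts = Counter(seq)
    symbols = set(p_star) | set(counts)
    return all(abs(counts.get(x, 0) / n - p_star.get(x, 0.0)) <= tol
               for x in symbols)

p_star = {0: 0.5, 1: 0.5}
balanced = is_typical([0] * 50 + [1] * 50, p_star)
skewed = is_typical([0] * 6500 + [1] * 3500, p_star)
```

For $n = 10000$ the tolerance is $0.1$, so the second sequence, whose empirical frequencies deviate by $0.15$, is not typical.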



Note that whenever $X^n \in \mathcal{T}_n$, the generalized power constraint is also essentially satisfied. 
It remains to handle conditions (i) and (iii). Consider next the encoder and the decoder of the 
legitimate receiver, depicted in Fig. 2. The source vector $U^N$ is first compressed by a Wyner-Ziv 
encoder, designed for distortion level $D$ and side information $V^N$, to a string of bits, $S = F_E(U^N)$, 
whose length does not exceed $N[R_{U|V}(D) + \epsilon/2] \le n[I(X^*;Y^*) - \epsilon/(2\lambda)]$. 

Figure 2: Encoder and decoder for the direct part. (Blocks: Wyner-Ziv source encoder, channel encoder, channel $P_{Y|X}$, channel decoder, Wyner-Ziv source decoder.) 

Now, let us select $R$ in the range 

$$
0 < R < R_{U|V}(D) - \lambda[I(X^*;Y^*) - I(X^*;Z^*)] - \epsilon, \tag{42}
$$

where the right-most side is positive, for sufficiently small $\epsilon$, due to (40). The key $K$ is a string of $NR$ purely random bits, 
which are XORed with (the first) $NR$ bits of $S$ (one-time pad). The resulting (partially) encrypted 
bit string, $T$, which will be represented by $T = S \oplus K = F_E(U^N) \oplus K$ (although it is possible that 
only some of the bits of $S$ are XORed with those of $K$), is the message to be conveyed across the 
channel. Now, let 

$$
q_t \triangleq \Pr\{T = t\}, \quad t = 1, 2, \ldots, M = 2^{N[R_{U|V}(D) + \epsilon/2]}. \tag{43}
$$
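The partial one-time pad $T = S \oplus K$ described above, in which only the first $NR$ bits of $S$ are masked, can be sketched as follows. The bit-list representation and the function name `encrypt_prefix` are assumptions made for illustration; the Wyner-Ziv encoder $F_E$ itself is not modeled, and `s_bits` simply stands for its output.

```python
import secrets

def encrypt_prefix(s_bits, key_bits):
    """XOR the key onto the first len(key_bits) bits of s_bits and
    leave the remaining bits unchanged."""
    if len(key_bits) > len(s_bits):
        raise ValueError("key must not be longer than the compressed string")
    head = [s ^ k for s, k in zip(s_bits, key_bits)]
    return head + list(s_bits[len(key_bits):])

# XOR is its own inverse, so decryption reuses the same routine.
decrypt_prefix = encrypt_prefix

s_bits = [1, 0, 1, 1, 0, 0, 1, 0]                     # stand-in for S = F_E(U^N)
key_bits = [secrets.randbits(1) for _ in range(3)]   # NR = 3 purely random key bits
t_bits = encrypt_prefix(s_bits, key_bits)
recovered = decrypt_prefix(t_bits, key_bits)
```

The legitimate decoder, which knows $K$, recovers $S$ exactly; the unencrypted tail of $S$ is protected only by the wiretap channel code.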

Next, let $M_1 = M_2 M$, where $M_2$ is a positive integer to be specified in the sequel. Let $\{x_m\}_{m=1}^{M_1}$ be 
a subset of $\mathcal{X}^n$, which can be viewed as a code for the channel $P_{Y|X}$ or $P_{Z|X}$. The channel encoder 
and decoder in Fig. 2 work as follows. They both share a partition of $\{x_m\}_{m=1}^{M_1}$ into $M$ sub-codes, 
$\mathcal{C}_1, \mathcal{C}_2, \ldots, \mathcal{C}_M$, each of size $M_2$. Let $\mathcal{C}_t = \{x_{(t-1)M_2+1}, \ldots, x_{tM_2}\}$, $t = 1, 2, \ldots, M$. When $T = t$, the 
channel encoder outputs a vector $X^n$ which is a (uniformly) randomly chosen member of sub-code 
$\mathcal{C}_t$. Thus, for $t = 1, 2, \ldots, M$, $\tau = 1, 2, \ldots, M_2$, 

$$
\Pr\{X^n = x_{(t-1)M_2+\tau} | T = t\} = \frac{1}{M_2} \tag{44}
$$

and 

$$
\Pr\{X^n = x_{(t-1)M_2+\tau}\} = \frac{q_t}{M_2}. \tag{45}
$$
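The randomized channel encoder of (44) amounts to splitting the codebook into consecutive blocks and drawing a uniformly random member of the block indexed by the message. A minimal sketch, with hypothetical function names and 0-based indices (the text uses 1-based indices $t$ and $\tau$):

```python
import random

def make_subcodes(codebook, num_subcodes):
    """Partition a list of M1 codewords into M consecutive sub-codes,
    each of size M2 = M1 / M."""
    m1 = len(codebook)
    if num_subcodes <= 0 or m1 % num_subcodes != 0:
        raise ValueError("M1 must be a positive multiple of M")
    m2 = m1 // num_subcodes
    return [codebook[t * m2:(t + 1) * m2] for t in range(num_subcodes)]

def channel_encode(t, subcodes, rng=random):
    """Output a uniformly chosen codeword of sub-code t, as in (44)."""
    return rng.choice(subcodes[t])

# Toy codebook: M1 = 12 ternary words of length 3, split into M = 3 sub-codes.
codebook = [(a, b, (a + b) % 3) for a in range(4) for b in range(3)]
subcodes = make_subcodes(codebook, num_subcodes=3)
x = channel_encode(1, subcodes, rng=random.Random(0))
```

The legitimate decoder only needs to identify the sub-code index, while the random choice within the sub-code is what confuses the wiretapper.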



As mentioned earlier, the set $\{x_m\}_{m=1}^{M_1}$ can be thought of as a code for the channel $P_{Y|X}$, where 
the prior probabilities of the codewords are given by (45). Let $T' = G(Y^n)$ denote the Bayes- 
optimal decoder for this code and these prior probabilities, which estimates the index $t$ of the 
sub-code $\mathcal{C}_t$ that contains the transmitted codeword $X^n$. Let $\delta = \delta_Y(x_1, \ldots, x_{M_1}) = \Pr\{T' \ne T\}$. 
Obviously, if $\delta$ is small, namely, if $T' = T$ with high probability, then the Wyner-Ziv decoder 
$\hat{U}^N = F_D(S', V^N) = F_D(T' \oplus K, V^N)$ would output the "correct" reconstruction vector within 
distortion $D$, with the same probability. 

Next, observe that each sub-code $\mathcal{C}_t$ may serve as a channel code for the degraded channel 
$P_{Z|X}$, provided that the corresponding decoder is informed of $t$. Let $\delta_t = \delta_Z(\mathcal{C}_t)$, $t = 1, 2, \ldots, M$, 
denote the error probability of code $\mathcal{C}_t$ w.r.t. the channel $P_{Z|X}$ when the decoder that observes $Z^n$ 
is informed of $t$. Finally, let $\bar{\delta} = \sum_{t=1}^M q_t\delta_t$. With these definitions, we next make our first step to 
handle condition (iii). 

Let $U^N$ and $K$ be such that $T = F_E(U^N) \oplus K = t$. Then, the channel input, given $T = t$, is 
distributed according to (44), that is, $X^n$ is a randomly chosen member of $\mathcal{C}_t$. Thus, $H(X^n|T = t) = \log M_2$. 
Since $\delta_t$ is the probability of error associated with $\mathcal{C}_t$, Fano's inequality yields: 

$$
H(X^n|Z^n, T = t) \le h(\delta_t) + \delta_t \log M_2 \le 1 + \delta_t \log M_2, \tag{46}
$$

where $h(\cdot)$ is the binary entropy function $h(\alpha) = -\alpha\log\alpha - (1-\alpha)\log(1-\alpha)$. It follows then that 

$$
I(X^n;Z^n|T = t) \ge (1 - \delta_t)\log M_2 - 1, \tag{47}
$$

which, upon averaging over $\{t\}$ with weights $\{q_t\}$, yields 

$$
I(X^n;Z^n|T) \ge (1 - \bar{\delta})\log M_2 - 1. \tag{48}
$$

On the other hand, 

$$
\begin{aligned}
I(X^n;Z^n|T) &= H(Z^n|T) - H(Z^n|X^n,T) \\
&= H(Z^n|T,U^N,W^N,K) - H(Z^n|X^n,T,U^N,W^N,K) \\
&\le H(Z^n|U^N,W^N,K) - H(Z^n|X^n,U^N,W^N,K) \\
&= I(X^n;Z^n|U^N,W^N,K),
\end{aligned}
\tag{49}
$$

where the second equality is due to the Markov relation $(U^N, W^N, K) \to T \to X^n \to Z^n$. Thus, we 
have established the inequality 

$$
I(X^n;Z^n|U^N,W^N,K) \ge (1 - \bar{\delta})\log M_2 - 1. \tag{50}
$$

In the sequel, we will choose $M_2$ so as to meet condition (iii). 
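The Fano-based bounds (46) and (48) are easy to evaluate numerically. The following sketch uses base-2 logarithms and hypothetical function names; the error probabilities and weights are illustrative numbers, not values derived in the text.

```python
import math

def h2(a):
    """Binary entropy function h(a) in bits."""
    if a <= 0.0 or a >= 1.0:
        return 0.0
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)

def fano_equivocation_bound(delta_t, m2):
    """Middle expression of (46): h(delta_t) + delta_t * log2(M2)."""
    return h2(delta_t) + delta_t * math.log2(m2)

def mi_lower_bound(deltas, weights, m2):
    """Right-hand side of (48): (1 - delta_bar) * log2(M2) - 1,
    where delta_bar is the q_t-weighted average of the delta_t."""
    delta_bar = sum(q * d for q, d in zip(weights, deltas))
    return (1 - delta_bar) * math.log2(m2) - 1

# Two sub-codes of size M2 = 1024, used with equal probability.
bound_48 = mi_lower_bound(deltas=[0.01, 0.03], weights=[0.5, 0.5], m2=1024)
# Per-sub-code check of (47): log2(M2) - (46) is at least (1 - delta_t) log2(M2) - 1.
exact_side = math.log2(1024) - fano_equivocation_bound(0.01, 1024)
loose_side = (1 - 0.01) * math.log2(1024) - 1
```

The gap between `exact_side` and `loose_side` comes only from replacing $h(\delta_t)$ by its maximum value of one bit.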

We next move on to handle condition (i). For every $S = s$, let $\mathcal{C}'_s$ denote the union of all 
codebooks $\{\mathcal{C}_t, t = s \oplus k\}_{k=0}^{2^{NR}-1}$, and let $\delta'_s = \delta_Z(\mathcal{C}'_s)$ denote the error probability of $\mathcal{C}'_s$ w.r.t. the 
channel $P_{Z|X}$ when the decoder is informed of $s$. Let 

$$
p_s \triangleq \Pr\{S = s\}, \quad s = 1, 2, \ldots, M. \tag{51}
$$

Finally, let $\delta' = \sum_s p_s\delta'_s$. With these definitions, let us now derive an upper bound on $H(K|U^N, W^N, Z^n)$: 

$$
\begin{aligned}
H(K|U^N,W^N,Z^n) &\le H(K|U^N,F_E(U^N),Z^n) \\
&= H(K|U^N,S,Z^n) \\
&\le H(S \oplus T|S,Z^n) \\
&= H(T|S,Z^n) \\
&= \sum_{s=1}^M p_s H(T|S=s,Z^n) \\
&\le \sum_{s=1}^M p_s[h(\delta'_s) + \delta'_s\log(2^{NR}M_2)] \\
&\le 1 + \delta'(NR + \log M_2),
\end{aligned}
\tag{52}
$$

where the third inequality is again Fano's inequality, and where we have also used the fact that $\delta'_s$ 
is an upper bound on the probability of error in estimating $T$, since $T$ is only the index $t$ of the 
codebook $\mathcal{C}_t$ to which the estimated codeword belongs. 

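To summarize our findings thus far, we substitute eqs. (52), (41) and (50) into (37), divide by 
$N$, and get: 

$$
\begin{aligned}
\Delta &\ge H(U|W) + R + (1-\bar{\delta}) \cdot \frac{\log M_2}{N} - \frac{2}{N} - \delta'\left(R + \frac{\log M_2}{N}\right) \\
&\quad - \lambda[I(X^*;Z^*) + \Pr\{X^n \in \mathcal{T}_n^c\} \cdot \log|\mathcal{X}| + f_1(n)].
\end{aligned}
\tag{53}
$$

Now, let us select 

$$
M_1 = 2^{n[I(X^*;Y^*) - \epsilon/(2\lambda)]} \tag{54}
$$

and 

$$
M_2 = \frac{M_1}{M} = 2^{n[I(X^*;Y^*) - R_{U|V}(D)/\lambda - \epsilon/\lambda]}. \tag{55}
$$

Applying this to (53), we get 

$$
\begin{aligned}
\Delta &\ge H(U|W) + R + (1-\bar{\delta})[\lambda I(X^*;Y^*) - R_{U|V}(D) - \epsilon] - \frac{2}{N} - \delta'(R + \log|\mathcal{X}|) \\
&\quad - \lambda[I(X^*;Z^*) + \Pr\{X^n \in \mathcal{T}_n^c\} \cdot \log|\mathcal{X}| + f_1(n)] \\
&\ge H(U|W) + R + \lambda[I(X^*;Y^*) - I(X^*;Z^*)] - R_{U|V}(D) \\
&\quad - \left\{\epsilon + \frac{2}{N} + (\bar{\delta} + \delta')(R + \log|\mathcal{X}|) + \lambda[\Pr\{X^n \in \mathcal{T}_n^c\} \cdot \log|\mathcal{X}| + f_1(n)]\right\} \\
&= H(U|W) + R + \lambda\Gamma\left(\frac{R_{U|V}(D)+\epsilon}{\lambda}, Q\right) - R_{U|V}(D) \\
&\quad - \left\{\epsilon + \frac{2}{N} + (\bar{\delta} + \delta')(R + \log|\mathcal{X}|) + \lambda[\Pr\{X^n \in \mathcal{T}_n^c\} \cdot \log|\mathcal{X}| + f_1(n)]\right\}.
\end{aligned}
\tag{56}
$$

Finally, to prove that the expected distortion of $\hat{U}^N$ relative to $U^N$ is essentially $D$, and to prove 
that $\Delta$ essentially meets the upper bound $\Delta^*(\lambda, R, D, Q)$ (namely, that the last term on the right- 
most side of (56) is arbitrarily small for large $N$), we have to prove the existence of a code $\{x_m\}_{m=1}^{M_1}$ 
for which $\delta$, $\bar{\delta}$, $\delta'$ and $\Pr\{X^n \in \mathcal{T}_n^c\}$ are all simultaneously arbitrarily small for large $N$. 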

To this end, let us define $\mu(x) = 1\{x \in \mathcal{T}_n^c\}$, and for a given code $\{x_m\}_{m=1}^{M_1}$, let $\delta_Y^m(x_1, \ldots, x_{M_1})$ 
denote the error probability w.r.t. the channel $P_{Y|X}$ with prior probabilities $\{q_t\}$ as given in (45), 
when $x_m$ is transmitted. Then, 

$$
\delta = \sum_{s=0}^{M-1} p_s \sum_{k=0}^{2^{NR}-1} \frac{1}{2^{NR}} \sum_{m=(s\oplus k)M_2+1}^{(s\oplus k+1)M_2} \frac{1}{M_2}\,\delta_Y^m(x_1, \ldots, x_{M_1}).
$$

Further, let $\delta_t = \delta_Z(\mathcal{C}_t)$, $\delta'_s = \delta_Z(\mathcal{C}'_s)$, $\bar{\delta}$, and $\delta'$ be defined as above. Then, 

$$
\begin{aligned}
\Phi(x_1, \ldots, x_{M_1}) &= \Pr\{X^n \in \mathcal{T}_n^c\} + \delta + \bar{\delta} + \delta' \\
&= \sum_{s=0}^{M-1} p_s \sum_{k=0}^{2^{NR}-1} \frac{1}{2^{NR}} \sum_{m=(s\oplus k)M_2+1}^{(s\oplus k+1)M_2} \frac{1}{M_2}\,[\mu(x_m) + \delta_Y^m(x_1, \ldots, x_{M_1}) + \delta_Z(\mathcal{C}_{s\oplus k}) + \delta_Z(\mathcal{C}'_s)].
\end{aligned}
\tag{57}
$$

Now, suppose that $\{x_m\}_{m=1}^{M_1}$ are selected at random, with each $x_m$ chosen independently according 
to $P_{X^*}^n(x^n) = \prod_{j=1}^n P_{X^*}(x_j)$. To prove that there exists a sequence of codes for which $\Phi \to 0$ as 
$n \to \infty$, all we have to show is that $E\Phi \to 0$. But 

$$
E\Phi = E\mu(X) + E\delta_Y^m(X_1, \ldots, X_{M_1}) + E\delta_Z(\mathcal{C}_{s\oplus k}) + E\delta_Z(\mathcal{C}'_s),
$$

where the indices $s$, $k$, and $m \in \{(s \oplus k)M_2 + 1, \ldots, (s \oplus k + 1)M_2\}$ are now immaterial. The 
first term tends to zero by the weak law of large numbers. The second term tends to zero by the 
ordinary random channel coding argument, as the rate of the code $\{x_m\}_{m=1}^{M_1}$ is less than $I(X^*;Y^*)$ 
(cf. the choice of $M_1$ above). By the same token, the fourth term vanishes with $n$, as $\mathcal{C}'_s$ is a random 
code of size $2^{NR}M_2$, and so its rate (cf. (42)) does not exceed 

$$
R/\lambda + I(X^*;Y^*) - R_{U|V}(D)/\lambda < I(X^*;Z^*) - \epsilon/\lambda,
$$

which means that it is reliable for the channel $P_{Z|X}$ on the average. A fortiori, the third term decays 
with $n$, as $\mathcal{C}_t$ is an even smaller random code. By a simple application of the Chebychev inequality, 
with probability of at least $2/3$, the random selection of the code yields $\Phi(x_1, \ldots, x_{M_1}) \le 3E\Phi$, 
which is still vanishingly small. On the other hand, since the codeword components are selected 
i.i.d. under $P_{X^*}$, then by the weak law of large numbers, for every $\epsilon > 0$ and large enough $n$ and 
$M_1$, we have, with probability that tends to unity, and in particular, larger than $1/2$ from some point 
on: 

$$
\sum_{t=1}^M \sum_{\tau=1}^{M_2} \frac{q_t}{M_2} \cdot \frac{1}{n}\sum_{j=1}^n \phi([X_{(t-1)M_2+\tau}](j)) \le Q + \epsilon, \tag{58}
$$

where $[X_{(t-1)M_2+\tau}](j)$ is the $j$-th component of the codeword $X_{(t-1)M_2+\tau}$. Since $2/3 + 1/2 > 1$, it 
follows then that there exist codes for which both $\Phi(x_1, \ldots, x_{M_1}) \le 3E\Phi$ (and hence all components 
of $\Phi$ must be small) and the power constraint (58) holds at the same time. 
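The rate bookkeeping behind (43), (54), (55) and the rate of $\mathcal{C}'_s$ can be checked with a few lines of arithmetic. The sketch below is an illustration only: the function name is hypothetical, and the numerical values of $\lambda$, $\epsilon$, $R_{U|V}(D)$, $I(X^*;Y^*)$ and $I(X^*;Z^*)$ are made-up numbers chosen to satisfy (39), (40) and (42), not values derived from an actual channel.

```python
def code_rates(lam, eps, r_wz, i_xy, key_rate):
    """Exponents per channel use (in bits), using n = lam * N."""
    rate_m = (r_wz + eps / 2) / lam           # (1/n) log2 M, from (43)
    rate_m1 = i_xy - eps / (2 * lam)          # (1/n) log2 M_1, from (54)
    rate_m2 = rate_m1 - rate_m                # (1/n) log2 (M_1 / M), cf. (55)
    rate_union = key_rate / lam + rate_m2     # rate of C'_s, of size 2^{NR} M_2
    return {"M": rate_m, "M1": rate_m1, "M2": rate_m2, "C_prime": rate_union}

# Hypothetical operating point (all quantities in bits).
lam, eps, r_wz, i_xy, i_xz = 1.0, 0.02, 0.5, 0.6, 0.3
key_rate_limit = r_wz - lam * (i_xy - i_xz) - eps    # right-hand side of (42)
key_rate = 0.1                                       # satisfies (42)
rates = code_rates(lam, eps, r_wz, i_xy, key_rate)
```

With these numbers the full code has rate below $I(X^*;Y^*)$, so the legitimate decoder can decode it, while $\mathcal{C}'_s$ has rate below $I(X^*;Z^*) - \epsilon/\lambda$, which is what the argument for the fourth term of $E\Phi$ requires.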

The maximum secrecy of $H(U|W)$ is, of course, approached by letting $R$ be arbitrarily close to 
(but strictly smaller than) $R_{U|V}(D) - \lambda\Gamma(R_{U|V}(D)/\lambda, Q)$. 

Finally, for completeness, we give a sketchy description of how the proof of the direct part 
should be slightly modified in the (simpler) case where eq. (40) does not hold, namely, 

$$
\frac{R_{U|V}(D)}{\lambda} \le I(X^*;Y^*) - I(X^*;Z^*). \tag{59}
$$

Note that in this case, the achievable upper bound on $\Delta$, asserted in Theorem 1, becomes $H(U|W)$ 
even for $R = 0$, as the bracketed term therein is non-positive. In this case, we will not use the key 
at all, i.e., $R = 0$ and $K$ is degenerate. Thus, (37) becomes now: 

$$
N\Delta \ge NH(U|W) + I(X^n;Z^n|U^N,W^N) - I(X^n;Z^n).
$$

As before, $I(X^n;Z^n)$ is essentially upper bounded by $nI(X^*;Z^*)$ using Lemma 1, and so, we only 
have to deal with the term $I(X^n;Z^n|U^N,W^N)$ and show that it is essentially lower bounded by 
$nI(X^*;Z^*)$. To this end, let us re-define $M_1$ as 

$$
M_1 = 2^{NR_{U|V}(D) + n[I(X^*;Z^*) - \epsilon/(2\lambda)]}
$$

and $M$ as before, so, 

$$
M_2 = \frac{M_1}{M} = 2^{n[I(X^*;Z^*) - \epsilon/\lambda]}.
$$

Now, since $NR_{U|V}(D) + nI(X^*;Z^*) \le nI(X^*;Y^*)$ (cf. (59)), the full codeword $x_m$ can be reliably 
decoded at the legitimate decoder, as before. Also, since each sub-code $\mathcal{C}_t$ is, again, of rate less 
than $I(X^*;Z^*)$, it can be decoded reliably by the wiretapper, provided that s/he is informed 
of $t$; thus $I(X^n;Z^n|U^N,W^N)$ is, again, essentially lower bounded by $\log M_2 = n[I(X^*;Z^*) - \epsilon/\lambda]$. 
This completes the proof of the direct part of Theorem 1. 